This report presents a renewed analysis of start codon usage and context in Cryptococcus neoformans. It uses the best-transcript annotation, and the corresponding map of start codon positions and sequences, made by Corinne Maufrais in June 2018.
It covers both JEC21 and H99 data: first several analyses on JEC21, then the same analyses on H99, then a joint analysis of signals conserved across both strains.
We check consensus sequences for both “narrow” (NNNNNATG) and “wide” (NNNNNNNNNATGNNN) neighbourhoods of the start codon, comparing annotated ATGs (aATGs) to downstream ATGs (dATGs), and find essentially the same results with both. The subsequent analyses therefore use the narrow score.
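To make the two neighbourhoods concrete, here is a minimal sketch in Python (the analysis itself was run in R) of how the narrow and wide windows could be cut out of a transcript. The function name, the 0-based indexing, and the example sequence are assumptions made for illustration, not taken from the original pipeline.

```python
def start_contexts(transcript, atg_index):
    """Return (narrow, wide) context strings around an ATG.

    transcript: transcript sequence in DNA letters, 5' to 3'.
    atg_index:  0-based index of the A of the ATG (an assumption of this
                sketch; the tables in this report may count differently).
    A window that would run off either end of the transcript is None.
    """
    seq = transcript.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")

    def window(upstream, downstream):
        lo = atg_index - upstream
        hi = atg_index + 3 + downstream
        if lo < 0 or hi > len(seq):
            return None
        return seq[lo:hi]

    narrow = window(5, 0)  # NNNNNATG
    wide = window(9, 3)    # NNNNNNNNNATGNNN
    return narrow, wide


# Hypothetical transcript with its ATG at index 11.
narrow, wide = start_contexts("TTGACCACAAAATGGCTAAG", 11)
```

Returning None for truncated windows keeps short 5' UTRs from silently producing shorter motifs that would not line up with the consensus.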
## # A tibble: 6,639 x 4
## # Groups: Gene [6,639]
## Gene RNA RPF TE
## <chr> <dbl> <dbl> <dbl>
## 1 CNM01300 4001. 18299. 4.57
## 2 CNM01080 8388. 8973. 1.07
## 3 CNA07570 5785. 7163. 1.24
## 4 CNG04360 3188. 7062. 2.22
## 5 CNB02360 3708. 6972. 1.88
## 6 CNA06350 15095. 6604. 0.437
## 7 CNC00700 2357. 6232. 2.64
## 8 CNF03840 11379. 6188. 0.544
## 9 CNF02150 15472. 6159. 0.398
## 10 CNF03160 5121. 6125. 1.20
## # ... with 6,629 more rows
We also calculated hiTrans_JEC21, the top 5% (330) translated genes by RPF TPM.
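In the table above, TE matches RPF divided by RNA (for example 18299 / 4001 ≈ 4.57). The following Python sketch shows that calculation and the selection of the most highly translated genes. The function names are hypothetical, and the number of genes to keep is passed in explicitly because the report does not state how 5% was rounded to 330.

```python
def add_te(rows):
    """Add translation efficiency, TE = RPF / RNA, to each row.

    rows: list of dicts with keys 'Gene', 'RNA' and 'RPF' (both in TPM).
    Genes with zero RNA get TE = NaN rather than raising an error.
    """
    out = []
    for r in rows:
        te = r["RPF"] / r["RNA"] if r["RNA"] > 0 else float("nan")
        out.append(dict(r, TE=te))
    return out


def top_n(rows, key, n):
    """Return the names of the n genes with the largest value of `key`."""
    ranked = sorted(rows, key=lambda r: r[key], reverse=True)
    return [r["Gene"] for r in ranked[:n]]


# Three rows copied from the JEC21 table above.
example = add_te([
    {"Gene": "CNM01300", "RNA": 4001.0, "RPF": 18299.0},
    {"Gene": "CNM01080", "RNA": 8388.0, "RPF": 8973.0},
    {"Gene": "CNA06350", "RNA": 15095.0, "RPF": 6604.0},
])
hi_trans = top_n(example, "RPF", 2)
```

On the full table, `top_n(rows, "RPF", 330)` would correspond to the hiTrans set described here.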
## # A tibble: 6,639 x 19
## Gene aATG.context aATG.pos d1.context d1.posTSS d1.posATG d1.frame
## <chr> <chr> <int> <chr> <int> <int> <int>
## 1 CNA00… GACCCCCTTGTTA… 93 ATAGCTGGTC… 226 -133 1
## 2 CNA00… ATATTGCCTGAGA… 102 GTCCACCTTA… 163 -61 1
## 3 CNA00… GAACTATCAAGCA… 214 GAGGCTCCGC… 512 -298 1
## 4 CNA00… ATTTTCAACAGCA… 81 AGCAATATAC… 307 -226 1
## 5 CNA00… ACCGTGCACACCA… 76 GTATTCGGGG… 106 -30 0
## 6 CNA00… AATCATACCAAAA… 117 GCCCCTATCT… 186 -69 0
## 7 CNA00… CCGACTATAAAAA… 52 AACCGTGCTA… 112 -60 0
## 8 CNA00… CTTTCTCTTCAGA… 77 TGCTATAGCA… 98 -21 0
## 9 CNA00… TAATCACACAAGA… 330 CTCATCATCA… 391 -61 1
## 10 CNA00… AAAAAAAACGCGA… 146 ACTTGTCGAC… 184 -38 2
## # ... with 6,629 more rows, and 12 more variables: d2.context <chr>,
## # d2.posTSS <int>, d2.posATG <int>, d2.frame <int>, u1.context <chr>,
## # u1.posTSS <int>, u1.posATG <int>, u1.frame <int>, u2.context <chr>,
## # u2.posTSS <int>, u2.posATG <int>, u2.frame <int>
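In the rows displayed above, d1.posATG equals aATG.pos minus d1.posTSS, and d1.frame equals the distance between the two ATGs modulo 3 (so 0 means the downstream ATG is in frame with the annotated one). The Python sketch below reproduces that relationship. It is inferred from the displayed rows only; the function name and the 0-based indexing are assumptions.

```python
def first_downstream_atg(transcript, a_pos):
    """Locate the first ATG downstream of the annotated ATG.

    transcript: transcript sequence in DNA letters, 5' to 3'.
    a_pos:      0-based index of the annotated ATG (assumed convention).
    Returns a dict with the downstream position, its offset relative to the
    annotated ATG (negative, as in the d1.posATG column) and its frame,
    or None if there is no downstream ATG.
    """
    seq = transcript.upper()
    d_pos = seq.find("ATG", a_pos + 1)
    if d_pos == -1:
        return None
    return {
        "posTSS": d_pos,
        "posATG": a_pos - d_pos,
        "frame": (d_pos - a_pos) % 3,
    }


# Synthetic transcript built to match the first displayed row
# (aATG.pos = 93, d1.posTSS = 226).
toy = "C" * 93 + "ATG" + "C" * 130 + "ATG" + "CCC"
d1 = first_downstream_atg(toy, 93)
```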
That’s for hiTrans_JEC21, the top 5% (330) translated genes by RPF TPM.
First upstream ATG.
First downstream ATG.
Except for 3rd-codon-position bias.
Calculate the motif score against the position weight matrix (PWM) for both the narrow (from -5 through the ATG) and wide (from -9 through the 3 nucleotides after the ATG) Kozak consensus motifs. These motifs are taken from the top 5% highly translated genes.
Information content is calculated as for a sequence logo; details at https://en.wikipedia.org/wiki/Sequence_logo
## # A tibble: 6 x 4
## Genes ATG Width Infon
## <chr> <chr> <chr> <dbl>
## 1 All aATG narrow 0.987
## 2 HiTrans aATG narrow 3.01
## 3 CytoRibo aATG narrow 4.24
## 4 All d1ATG narrow 0.144
## 5 HiTrans d1ATG narrow 0.340
## 6 CytoRibo d1ATG narrow 0.561
Information content in bits of the highly-translated consensus (excluding the 6 bits from the ATG itself): narrow is 3.01, wide is 4.71.
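For reference, a minimal Python sketch of the sequence-logo information content: at each position, 2 bits minus the Shannon entropy of the nucleotide frequencies. It applies no small-sample correction, and whether the values reported here include one is not stated, so treat this as an illustration rather than the exact calculation used.

```python
import math
from collections import Counter


def information_content(sequences, alphabet="ACGT"):
    """Per-position information content (bits) of aligned sequences.

    sequences: equal-length strings, e.g. narrow contexts NNNNNATG.
    Only letters in `alphabet` are counted at each position.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must all have the same length")
    bits = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        total = sum(counts[b] for b in alphabet)
        if total == 0:
            raise ValueError("no countable letters at position %d" % i)
        entropy = 0.0
        for b in alphabet:
            p = counts[b] / total
            if p > 0:
                entropy -= p * math.log2(p)
        bits.append(math.log2(len(alphabet)) - entropy)
    return bits


# Toy alignment: one uninformative flanking position, then a fixed ATG.
bits = information_content(["AATG", "CATG", "GATG", "TATG"])
flank_bits = sum(bits[:-3])  # information excluding the ATG itself
atg_bits = sum(bits[-3:])    # a fixed ATG contributes 3 x 2 = 6 bits
```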
We calculate scores using Biostrings::PWMscoreStartingAt.
The best description I could find of this method is: https://support.bioconductor.org/p/61520/
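The score is just the sum, over the positions of the motif, of the PWM entry for the nucleotide found at that position; equivalently, the sum of the elementwise product of the PWM with the one-hot encoded sequence.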
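The scoring idea can be sketched in plain Python as below. This is not a reimplementation of Biostrings: the log-odds construction, the pseudocount, the uniform background and the min-max rescaling to the unit interval are all assumptions of the sketch. The rescaling is included only because the scores tabulated in this report run up to 1.

```python
import math

BASES = "ACGT"


def build_pwm(sequences, pseudocount=1.0):
    """Log2-odds PWM against a uniform background, one dict per position."""
    length = len(sequences[0])
    total = len(sequences) + pseudocount * len(BASES)
    pwm = []
    for i in range(length):
        column = {}
        for b in BASES:
            count = sum(1 for s in sequences if s[i] == b)
            column[b] = math.log2(((count + pseudocount) / total) / 0.25)
        pwm.append(column)
    return pwm


def pwm_score(pwm, sequence):
    """Sum the PWM entry selected by the base at each position."""
    if len(sequence) != len(pwm):
        raise ValueError("sequence and PWM differ in length")
    return sum(column[base] for column, base in zip(pwm, sequence))


def unit_scaled_score(pwm, sequence):
    """Rescale so the worst possible sequence is 0 and the best is 1."""
    lo = sum(min(column.values()) for column in pwm)
    hi = sum(max(column.values()) for column in pwm)
    return (pwm_score(pwm, sequence) - lo) / (hi - lo)


# Hypothetical training contexts (2 flanking bases plus ATG).
pwm = build_pwm(["AAATG", "AAATG", "ACATG", "CAATG"])
best = unit_scaled_score(pwm, "AAATG")
middle = unit_scaled_score(pwm, "ACATG")
```

With a PWM built from the hiTrans contexts, the same call would be applied to each gene's aATG, d1 and u1 context.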
Write scores to file scores_kozak_JEC21.txt.
## # A tibble: 6,639 x 11
## Gene aATG.scorekn d1.scorekn u1.scorekn aATG.scorekw d1.scorekw
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CNA00010 0.737 0.723 0.968 0.662 0.707
## 2 CNA00020 0.814 0.934 0.868 0.735 0.892
## 3 CNA00030 0.875 0.698 0.804 0.874 0.689
## 4 CNA00040 0.887 0.802 NA 0.863 0.772
## 5 CNA00050 0.955 0.707 NA 0.839 0.707
## 6 CNA00060 1.000 0.889 NA 0.936 0.792
## 7 CNA00070 0.977 0.809 NA 0.893 0.670
## 8 CNA00075 0.791 0.922 0.735 0.799 0.835
## 9 CNA00080 0.933 0.923 NA 0.890 0.886
## 10 CNA00090 0.848 0.781 NA 0.875 0.699
## # ... with 6,629 more rows, and 5 more variables: u1.scorekw <dbl>,
## # d1vsan <dbl>, u1vsan <dbl>, d1vsaw <dbl>, u1vsaw <dbl>
Red: high dATG vs aATG Kozak score. Blue: highly translated. Purple: both.
R = -0.058
Those genes are in this list:
## # A tibble: 330 x 3
## Gene aATG.scorekn d1.scorekn
## <chr> <dbl> <dbl>
## 1 CNI00340 0.675 0.967
## 2 CNK00900 0.708 0.999
## 3 CNA01530 0.675 0.956
## 4 CNI00670 0.710 0.990
## 5 CNB01880 0.625 0.894
## 6 CNK02980 0.700 0.968
## 7 CNL05790 0.642 0.909
## 8 CNA07070 0.645 0.911
## 9 CNN00160 0.724 0.989
## 10 CNB00790 0.736 0.999
## # ... with 320 more rows
For top 3315 / 50% of genes by mean RNA TPM.
In input file JEC21_mitofates_26June2018.txt.
It’s just a subset: the dual-localized ones.
For top 3315 / 50% of genes by mean RNA TPM.
## # A tibble: 6,797 x 4
## # Groups: Gene [6,797]
## Gene RNA RPF TE
## <chr> <dbl> <dbl> <dbl>
## 1 CNAG_06125 10279. 20179. 1.96
## 2 CNAG_06101 8672. 8471. 0.977
## 3 CNAG_05762 7396. 7528. 1.02
## 4 CNAG_00779 3861. 7368. 1.91
## 5 CNAG_03127 6257. 7184. 1.15
## 6 CNAG_04011 13313. 6786. 0.510
## 7 CNAG_06222 6401. 6784. 1.06
## 8 CNAG_01455 12683. 6580. 0.519
## 9 CNAG_05525 6978. 6456. 0.925
## 10 CNAG_03739 6383. 6394. 1.00
## # ... with 6,787 more rows
We also calculated hiTrans_H99, the top 5% (330) translated genes by RPF TPM.
## # A tibble: 6,797 x 19
## Gene aATG.context aATG.pos d1.context d1.posTSS d1.posATG d1.frame
## <chr> <chr> <int> <chr> <int> <int> <int>
## 1 CNAG_… TACTTACGCGACA… 70 AAATTCACTT… 100 -30 0
## 2 CNAG_… GAACTTCGATCAA… 52 TCTCCCGCCA… 114 -62 2
## 3 CNAG_… GTAGACTTACCTA… 346 CACGGGCATC… 395 -49 1
## 4 CNAG_… CACATACGTAACA… 214 CCGAACGGCG… 256 -42 0
## 5 CNAG_… GACTATACAAAAA… 55 GGAGGTGGGC… 163 -108 0
## 6 CNAG_… AACCATACAAAAA… 99 CAAAGCCATT… 259 -160 1
## 7 CNAG_… ACCGTGCACACCA… 75 GTATTCGGAA… 105 -30 0
## 8 CNAG_… GTTTTCAACAGCA… 73 CCCATCAGAA… 380 -307 1
## 9 CNAG_… GTACTATTGAACA… 206 GAGGCTCCGC… 513 -307 1
## 10 CNAG_… TACAAGCTTGAAA… 90 GGCCGCCTTA… 151 -61 1
## # ... with 6,787 more rows, and 12 more variables: d2.context <chr>,
## # d2.posTSS <int>, d2.posATG <int>, d2.frame <int>, u1.context <chr>,
## # u1.posTSS <int>, u1.posATG <int>, u1.frame <int>, u2.context <chr>,
## # u2.posTSS <int>, u2.posATG <int>, u2.frame <int>
That’s for hiTrans_H99, the top 5% (330) translated genes by RPF TPM.
Ideally would fix this more nicely.
First upstream ATG.
First downstream ATG.
Except for 3rd-codon-position bias.
Calculate the motif score against the position weight matrix (PWM) for both the narrow (from -5 through the ATG) and wide (from -9 through the 3 nucleotides after the ATG) Kozak consensus motifs. These motifs are taken from the top 5% highly translated genes.
Information content is calculated as for a sequence logo; details at https://en.wikipedia.org/wiki/Sequence_logo
## # A tibble: 6 x 4
## Genes ATG Width Infon
## <chr> <chr> <chr> <dbl>
## 1 All aATG narrow 0.550
## 2 HiTrans aATG narrow 3.12
## 3 CytoRibo aATG narrow 4.38
## 4 All d1ATG narrow 0.130
## 5 HiTrans d1ATG narrow 0.267
## 6 CytoRibo d1ATG narrow 0.501
Information content in bits of the highly-translated consensus (excluding the 6 bits from the ATG itself): narrow is 3.12, wide is 4.9.
Write scores to file scores_kozak_H99.txt.
## # A tibble: 6,797 x 11
## Gene aATG.scorekn d1.scorekn u1.scorekn aATG.scorekw d1.scorekw
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_00002 0.866 0.784 0.853 0.834 0.687
## 2 CNAG_00003 0.833 0.847 NA 0.792 0.808
## 3 CNAG_00004 0.794 0.795 NA 0.759 0.665
## 4 CNAG_00005 0.880 0.752 0.855 0.892 0.659
## 5 CNAG_00006 0.978 0.727 NA 0.872 0.624
## 6 CNAG_00007 0.978 0.708 NA 0.916 0.662
## 7 CNAG_00008 0.960 0.819 NA 0.847 0.798
## 8 CNAG_00009 0.876 0.798 NA 0.864 0.783
## 9 CNAG_00010 0.896 0.796 0.822 0.849 0.756
## 10 CNAG_00011 0.878 0.937 0.691 0.743 0.862
## # ... with 6,787 more rows, and 5 more variables: u1.scorekw <dbl>,
## # d1vsan <dbl>, u1vsan <dbl>, d1vsaw <dbl>, u1vsaw <dbl>
R = -0.046
Those genes are in this list:
## # A tibble: 330 x 3
## Gene aATG.scorekn d1.scorekn
## <chr> <dbl> <dbl>
## 1 CNAG_07473 0.606 0.969
## 2 CNAG_04147 0.644 0.984
## 3 CNAG_04764 0.641 0.948
## 4 CNAG_07801 0.675 0.978
## 5 CNAG_02259 0.676 0.978
## 6 CNAG_03953 0.692 0.991
## 7 CNAG_07776 0.700 0.993
## 8 CNAG_06278 0.699 0.991
## 9 CNAG_00165 0.675 0.957
## 10 CNAG_04179 0.709 0.991
## # ... with 320 more rows
For top 3315 / 50% of genes by mean RNA TPM.
In input file H99_mitofates_26June2018.txt.
It’s just a subset: the dual-localized ones.
For top 3315 / 50% of genes by mean RNA TPM.
From the 2016 paper.
## # A tibble: 6,341 x 2
## H99 JEC21
## <chr> <chr>
## 1 CNAG_01397 CND05080
## 2 CNAG_07825 CNH03545
## 3 CNAG_05539 CNH01890
## 4 CNAG_03635 CNB01365
## 5 CNAG_06621 CNF03970
## 6 CNAG_00830 CNA08090
## 7 CNAG_07556 CNK01100
## 8 CNAG_06796 CNB00060
## 9 CNAG_06009 CNM00180
## 10 CNAG_03522 CNG00710
## # ... with 6,331 more rows
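A minimal Python sketch of how an ortholog map like this could be used to put per-strain values side by side, producing columns named as in the combined tables in this section. The function name and the choice to drop pairs missing from either strain are assumptions.

```python
def join_by_ortholog(ortholog_pairs, h99_rows, jec21_rows,
                     columns=("RNA", "RPF", "TE")):
    """Join per-strain tables through a list of (H99, JEC21) gene pairs.

    h99_rows, jec21_rows: lists of dicts keyed by 'Gene' plus `columns`.
    Pairs where either gene is missing from its table are dropped.
    """
    h99 = {r["Gene"]: r for r in h99_rows}
    jec21 = {r["Gene"]: r for r in jec21_rows}
    joined = []
    for h_id, j_id in ortholog_pairs:
        if h_id not in h99 or j_id not in jec21:
            continue
        row = {"H99": h_id, "JEC21": j_id}
        for col in columns:
            row[col + ".H99"] = h99[h_id][col]
            row[col + ".JEC21"] = jec21[j_id][col]
        joined.append(row)
    return joined


# One pair with values copied from the per-strain tables in this report,
# plus one pair whose genes are absent from the toy tables.
joined = join_by_ortholog(
    [("CNAG_06125", "CNM01300"), ("CNAG_01397", "CND05080")],
    [{"Gene": "CNAG_06125", "RNA": 10279.0, "RPF": 20179.0, "TE": 1.96}],
    [{"Gene": "CNM01300", "RNA": 4001.0, "RPF": 18299.0, "TE": 4.57}],
)
```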
## # A tibble: 20 x 8
## H99 JEC21 RNA.H99 RPF.H99 TE.H99 RNA.JEC21 RPF.JEC21 TE.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_06125 CNM01300 10279. 20179. 1.96 4001. 18299. 4.57
## 2 CNAG_06101 CNM01080 8672. 8471. 0.977 8388. 8973. 1.07
## 3 CNAG_00779 CNA07570 3861. 7368. 1.91 5785. 7163. 1.24
## 4 CNAG_03127 CNG04360 6257. 7184. 1.15 3188. 7062. 2.22
## 5 CNAG_05762 CNF02150 7396. 7528. 1.02 15472. 6159. 0.398
## 6 CNAG_03739 CNB02360 6383. 6394. 1.00 3708. 6972. 1.88
## 7 CNAG_06222 CNM02240 6401. 6784. 1.06 4442. 6022. 1.36
## 8 CNAG_00655 CNA06350 12493. 6052. 0.484 15095. 6604. 0.437
## 9 CNAG_04011 CNB04930 13313. 6786. 0.510 19956. 5660. 0.284
## 10 CNAG_06633 CNF03840 8926. 6146. 0.689 11379. 6188. 0.544
## 11 CNAG_01332 CND04480 5928. 6096. 1.03 4661. 5987. 1.28
## 12 CNAG_03015 CNC00700 4860. 5704. 1.17 2357. 6232. 2.64
## 13 CNAG_04448 CNI01090 6660. 5962. 0.895 5374. 5900. 1.10
## 14 CNAG_00771 CNA07490 6830. 5879. 0.861 6812. 5968. 0.876
## 15 CNAG_00640 CNA06200 7706. 5784. 0.751 4908. 6048. 1.23
## 16 CNAG_04883 CNJ03110 4275. 5882. 1.38 5777. 5917. 1.02
## 17 CNAG_04726 CNJ01560 7970. 6377. 0.800 6365. 5406. 0.849
## 18 CNAG_00672 CNA06500 9202. 6072. 0.660 14347. 5654. 0.394
## 19 CNAG_05525 CNH01770 6978. 6456. 0.925 4071. 5212. 1.28
## 20 CNAG_03780 CNB02750 6631. 5710. 0.861 5014. 5892. 1.18
## # A tibble: 20 x 8
## H99 JEC21 RNA.H99 RPF.H99 TE.H99 RNA.JEC21 RPF.JEC21 TE.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_01130 CND02530 47.7 301. 6.31 32.7 298. 9.11
## 2 CNAG_01890 CNK02310 248. 1447. 5.84 280. 1905. 6.79
## 3 CNAG_06150 CNM01520 607. 3357. 5.53 540. 2932. 5.43
## 4 CNAG_02994 CNC06020 68.6 262. 3.81 31.3 219. 6.98
## 5 CNAG_01750 CNC02520 257. 1309. 5.10 312. 1659. 5.31
## 6 CNAG_04327 CNI02220 44.7 202. 4.52 34.8 197. 5.66
## 7 CNAG_01727 CNC02320 737. 3578. 4.86 710. 3755. 5.29
## 8 CNAG_01744 CNC02470 104. 316. 3.04 33.7 238. 7.05
## 9 CNAG_01117 CND02420 437. 2100. 4.81 441. 2279. 5.17
## 10 CNAG_05907 CNF00650 94.5 298. 3.15 66.5 451. 6.78
## 11 CNAG_04640 CNJ00800 217. 823. 3.80 159. 971. 6.09
## 12 CNAG_04313 CNI02360 205. 225. 1.10 31.0 254. 8.20
## 13 CNAG_07373 CNA06000 65.8 302. 4.59 78.2 355. 4.53
## 14 CNAG_05602 CNH02450 382. 676. 1.77 27.6 192. 6.97
## 15 CNAG_06840 CND06220 1197. 2883. 2.41 463. 2900. 6.27
## 16 CNAG_00136 CNA01230 46.4 197. 4.24 45.6 200. 4.39
## 17 CNAG_05884 CNF00890 79.8 295. 3.69 73.2 361. 4.93
## 18 CNAG_06208 CNM02070 251. 977. 3.89 232. 988. 4.26
## 19 CNAG_00992 CND01200 254. 891. 3.51 263. 1170. 4.45
## 20 CNAG_04659 CNJ00950 25.8 55.7 2.16 26.4 152. 5.78
To-do: Check which of these have uATGs.
## # A tibble: 20 x 8
## H99 JEC21 RNA.H99 RPF.H99 TE.H99 RNA.JEC21 RPF.JEC21 TE.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_07888 CNH025… 2649. 2.58 9.75e-4 582. 0.470 0.000806
## 2 CNAG_07695 CNF003… 164. 5.72 3.48e-2 183. 3.21 0.0176
## 3 CNAG_03140 CNG042… 188. 2.03 1.08e-2 124. 5.51 0.0442
## 4 CNAG_05574 CNH022… 30.1 2.58 8.59e-2 42.6 1.67 0.0391
## 5 CNAG_04855 CNJ027… 30.4 2.66 8.75e-2 88.2 6.70 0.0760
## 6 CNAG_06614 CNF040… 41.8 4.48 1.07e-1 58.1 4.58 0.0789
## 7 CNAG_01603 CNC011… 52.5 3.41 6.49e-2 25.0 3.04 0.122
## 8 CNAG_02323 CNE022… 39.1 3.34 8.52e-2 50.2 5.12 0.102
## 9 CNAG_03578 CNG002… 43.5 6.10 1.40e-1 58.6 4.37 0.0745
## 10 CNAG_07813 CNL049… 148. 20.0 1.35e-1 204. 17.4 0.0850
## 11 CNAG_06246 CNM024… 196. 24.6 1.25e-1 172. 17.0 0.0990
## 12 CNAG_05319 CNH031… 35.4 0.588 1.66e-2 35.6 7.71 0.217
## 13 CNAG_00784 CNA076… 52.8 6.67 1.26e-1 50.1 5.86 0.117
## 14 CNAG_08027 CNH020… 25.6 2.64 1.03e-1 90.3 13.6 0.151
## 15 CNAG_02433 CNE012… 39.1 6.71 1.72e-1 115. 11.1 0.0965
## 16 CNAG_00529 CNA051… 40.2 7.94 1.98e-1 97.3 7.85 0.0806
## 17 CNAG_05237 CNL039… 33.0 9.10 2.76e-1 72.3 0.278 0.00384
## 18 CNAG_05288 CNH034… 56.3 8.89 1.58e-1 70.4 8.85 0.126
## 19 CNAG_01624 CNC013… 34.4 5.03 1.46e-1 30.5 4.37 0.143
## 20 CNAG_02867 CNC048… 54.6 7.30 1.34e-1 47.8 7.51 0.157
We take transcripts where the overall gene expression (RNA abundance in top 50%), the difference in score (dATG > aATG in top 5%), and the dATG frame are all conserved between H99 and JEC21.
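The three conditions can be sketched in Python as follows. The column names for RNA and frame, the cutoffs, and the reading of “conserved frame” as “same frame value in both strains” are assumptions; the cutoffs themselves (top 50% of RNA, top 5% of dATG minus aATG score) would be computed per strain beforehand.

```python
STRAINS = ("H99", "JEC21")


def passes(row, strain, rna_cutoff, diff_cutoff):
    """True if the gene is expressed and its dATG outscores its aATG."""
    diff = row["d.skn." + strain] - row["a.skn." + strain]
    return (row["RNA." + strain] >= rna_cutoff[strain]
            and diff >= diff_cutoff[strain])


def split_conserved(rows, rna_cutoff, diff_cutoff):
    """Return (in_frame, out_of_frame) lists of (H99, JEC21) gene pairs
    that pass both filters in both strains with the same dATG frame."""
    in_frame, out_of_frame = [], []
    for row in rows:
        if not all(passes(row, s, rna_cutoff, diff_cutoff) for s in STRAINS):
            continue
        frame_h99, frame_jec21 = row["frame.H99"], row["frame.JEC21"]
        if frame_h99 != frame_jec21:
            continue
        pair = (row["H99"], row["JEC21"])
        (in_frame if frame_h99 == 0 else out_of_frame).append(pair)
    return in_frame, out_of_frame


def make_row(h99, jec21, a_h, d_h, a_j, d_j, frame_h, frame_j):
    # RNA values of 100 and the frames passed in are hypothetical.
    return {"H99": h99, "JEC21": jec21,
            "a.skn.H99": a_h, "d.skn.H99": d_h,
            "a.skn.JEC21": a_j, "d.skn.JEC21": d_j,
            "RNA.H99": 100.0, "RNA.JEC21": 100.0,
            "frame.H99": frame_h, "frame.JEC21": frame_j}


# Scores are copied from tables in this report; cutoffs are hypothetical.
rows = [
    make_row("CNAG_07473", "CNB01880", 0.606, 0.969, 0.625, 0.894, 0, 0),
    make_row("CNAG_06278", "CNN00160", 0.699, 0.991, 0.724, 0.989, 1, 1),
    make_row("CNAG_04147", "CNI02850", 0.644, 0.984, 0.946, 0.890, 0, 0),
]
cutoff_rna = {"H99": 50.0, "JEC21": 50.0}
cutoff_diff = {"H99": 0.2, "JEC21": 0.2}
in_frame, out_of_frame = split_conserved(rows, cutoff_rna, cutoff_diff)
```

The third toy row is dropped because its dATG does not outscore its aATG in JEC21, which is the kind of strain-specific difference examined later in this section.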
Saved to file dvsaATG_highdiffn_inframe_cc.txt.
## # A tibble: 44 x 6
## H99 JEC21 a.skn.H99 d.skn.H99 a.skn.JEC21 d.skn.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_07473 CNB01880 0.606 0.969 0.625 0.894
## 2 CNAG_07776 CNI00670 0.700 0.993 0.710 0.990
## 3 CNAG_00165 CNA01530 0.675 0.957 0.675 0.956
## 4 CNAG_02545 CNE00210 0.666 0.940 0.682 0.943
## 5 CNAG_07801 CNL06190 0.675 0.978 0.685 0.904
## 6 CNAG_01544 CNC06400 0.727 0.978 0.723 0.978
## 7 CNAG_03953 CNB04410 0.692 0.991 0.741 0.947
## 8 CNAG_04179 CNI03160 0.709 0.991 0.725 0.947
## 9 CNAG_05722 CNF02520 0.668 0.912 0.675 0.918
## 10 CNAG_02431 CNE01260 0.727 0.999 0.731 0.943
## 11 CNAG_02880 CNC04930 0.666 0.912 0.682 0.911
## 12 CNAG_03996 CNB04810 0.666 0.899 0.625 0.848
## 13 CNAG_00517 CNA04990 0.732 0.934 0.702 0.942
## 14 CNAG_02259 CNE02870 0.676 0.978 0.695 0.835
## 15 CNAG_03396 CNG01890 0.632 0.861 0.642 0.854
## 16 CNAG_00086 CNA00760 0.736 0.957 0.742 0.956
## 17 CNAG_07873 CNH00360 0.733 0.948 0.749 0.946
## 18 CNAG_04219 CNI03610 0.779 0.978 0.771 0.978
## 19 CNAG_04604 CNJ00430 0.789 0.990 0.797 0.990
## 20 CNAG_00026 CNA00190 0.726 0.957 0.720 0.877
## # ... with 24 more rows
Saved to file dvsaATG_highdiffn_outframe_cc.txt.
## # A tibble: 14 x 6
## H99 JEC21 a.skn.H99 d.skn.H99 a.skn.JEC21 d.skn.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_06278 CNN00160 0.699 0.991 0.724 0.989
## 2 CNAG_04054 CNB05380 0.717 0.978 0.723 0.978
## 3 CNAG_02894 CNC05065 0.783 0.978 0.798 0.978
## 4 CNAG_06006 CNM00150 0.715 0.934 0.781 0.933
## 5 CNAG_02809 CNC04270 0.801 0.978 0.792 0.978
## 6 CNAG_03008 CNC06190 0.827 0.999 0.817 1
## 7 CNAG_03370 CNG02120 0.803 0.969 0.795 0.968
## 8 CNAG_01667 CNC01780 0.842 0.990 0.837 0.990
## 9 CNAG_00784 CNA07610 0.692 0.842 0.696 0.837
## 10 CNAG_02578 CNK00690 0.826 0.945 0.791 0.944
## 11 CNAG_01241 CND03590 0.757 0.883 0.761 0.897
## 12 CNAG_07780 CNI00090 0.876 0.999 0.867 1
## 13 CNAG_01270 CND03900 0.668 0.791 0.675 0.791
## 14 CNAG_03839 CNB03280 0.824 0.941 0.822 0.942
Filtered for enough RNA (top 50%)
## # A tibble: 120 x 6
## H99 JEC21 a.skn.H99 d.skn.H99 a.skn.JEC21 d.skn.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_04147 CNI02850 0.644 0.984 0.946 0.890
## 2 CNAG_06000 CNM00090 0.686 0.913 0.966 0.757
## 3 CNAG_03486 CNG01060 0.703 0.967 0.966 0.778
## 4 CNAG_06196 CNM01950 0.732 0.634 0.990 0.743
## 5 CNAG_06353 CNN00820 0.658 0.904 0.904 0.837
## 6 CNAG_06446 CNN01710 0.764 0.999 1 0.628
## 7 CNAG_01092 CND02180 0.689 0.925 0.922 0.877
## 8 CNAG_01188 CND03130 0.651 0.791 0.883 0.732
## 9 CNAG_05678 CNF02960 0.689 0.910 0.917 0.932
## 10 CNAG_07645 CNE03115 0.660 0.754 0.887 0.685
## # ... with 110 more rows
## # A tibble: 120 x 6
## H99 JEC21 a.skn.H99 d.skn.H99 a.skn.JEC21 d.skn.JEC21
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 CNAG_00529 CNA05110 0.948 0.811 0.702 0.887
## 2 CNAG_03410 CNG01740 0.990 0.741 0.754 0.990
## 3 CNAG_04089 CNB05680 0.967 0.817 0.742 0.891
## 4 CNAG_04751 CNJ01820 0.960 0.860 0.737 0.922
## 5 CNAG_04899 CNJ03230 0.948 0.783 0.736 0.968
## 6 CNAG_02703 CNK01910 0.894 0.747 0.696 0.883
## 7 CNAG_03638 CNB01390 0.909 0.793 0.718 0.666
## 8 CNAG_05504 CNH01580 0.967 0.819 0.786 0.849
## 9 CNAG_05692 CNF02820 0.923 0.785 0.745 0.849
## 10 CNAG_07609 CNC03180 0.967 0.890 0.790 0.838
## # ... with 110 more rows
Many of these have the expected structure, in which homologs differ only at the N-terminus. There appears to be a swap between a near-ATG start codon and a poor-context ATG between the species.
Higher aATG score in JEC21:
Higher aATG score in H99:
Most of these look either misannotated in one strain or uninteresting. Is the upstream start codon in one strain actually used? Check for ribosome footprints and for other features (homology, mitochondrial localization sequence). It would be nice to have an additional filter here.
This was done on 25th June, with the values generated by CryptoATGcontext at that time, so it is not a reproducible analysis here!
I performed GO analysis with PANTHER.db on JEC21 gene names. PANTHER version 13.1 Released 2018-02-03, Overrepresentation test on GOslim terms.
Link: http://www.pantherdb.org/tools/compareToRefList.jsp
File dvsaATG_highdiffn_outframe_cc.txt.
No significant GO terms.
File dvsaATG_highdiffn_inframe_cc.txt.
Enriched in Biological processes:
Molecular Function:
Cellular Component:
File hiTrans_cc.txt.
Enriched BPs include:
Enriched MFs include:
Enriched CCs include:
File hiTE_cc.txt.
Enriched BPs include:
Enriched MFs include:
Enriched CCs include:
File loTE_cc.txt.
Enriched BPs include:
Enriched MFs, no sig. results.
Enriched CCs, no sig. results.